1 Introduction

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a new type of coronavirus: severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The outbreak started in Wuhan, China, in December 2019. The first known case of COVID-19 in the U.S. was confirmed on January 20, 2020, in a 35-year-old man who returned to Washington State on January 15 after traveling to Wuhan. Starting around the end of February, evidence emerged of community spread in the U.S.

We are all indebted to the heroes who are fighting COVID-19 across the world in many different ways. For this data exploration, I am grateful to the many data science groups that have collected detailed COVID-19 outbreak data, including the numbers of tests, confirmed cases, and deaths, across countries/regions, states/provinces (administrative division level 1, or admin1), and counties (admin2). Specifically, I used data from three resources: Johns Hopkins University (JHU), The New York Times (NY Times), and the COVID Tracking Project.

2 JHU

This section assumes that you have cloned the JHU GitHub repository to "../COVID-19" on your local machine.

2.1 time series data

The time series files provide counts (e.g., confirmed cases and deaths) starting from January 22, 2020, for 253 locations. Currently, these time series files contain no data for individual U.S. states.
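A minimal base R sketch for loading these files, assuming the clone sits at "../COVID-19" and the global time series files follow the naming below (an assumption about the repository layout; check it against your clone). Each row is one location and each date is its own column, so `nrow()` on the loaded table gives the number of locations and the parsed column names give the dates covered. The helper names and the toy table are hypothetical, for illustration only.

```r
# Sketch, assuming the JHU repository is cloned at ../COVID-19 and uses the
# file layout below (verify the path against your own clone).
jhu_ts_path <- function(type = c("confirmed", "deaths"), repo = "../COVID-19") {
  type <- match.arg(type)
  file.path(repo, "csse_covid_19_data", "csse_covid_19_time_series",
            sprintf("time_series_covid19_%s_global.csv", type))
}

# check.names = FALSE keeps column names such as "Country/Region" and "1/22/20"
read_jhu_ts <- function(type = "confirmed", repo = "../COVID-19") {
  read.csv(jhu_ts_path(type, repo), check.names = FALSE, stringsAsFactors = FALSE)
}

# Date columns are the ones whose names parse as month/day/two-digit year
ts_dates <- function(wide) {
  d <- as.Date(names(wide), format = "%m/%d/%y")
  d[!is.na(d)]
}

# Hypothetical two-row table in the same wide layout, for illustration only
toy <- data.frame(`Province/State` = c("", "P1"),
                  `Country/Region` = c("Country A", "Country B"),
                  Lat = c(0, 0), Long = c(0, 0),
                  `1/22/20` = c(0, 1), `1/23/20` = c(2, 3),
                  check.names = FALSE, stringsAsFactors = FALSE)
ts_dates(toy)  # dates 2020-01-22 and 2020-01-23
nrow(toy)      # one row per location
```

On the real data, `confirmed <- read_jhu_ts("confirmed")` followed by `nrow(confirmed)` and `range(ts_dates(confirmed))` would report the number of locations and the date span.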

Here is the list of 10 records with the largest number of cases or deaths on the most recent date.

Next, for each country/region, I check the daily numbers of new cases and new deaths. These data are important for understanding the trends under different conditions, e.g., population density and social distancing policies. Here I show the 10 countries/regions with the highest numbers of deaths.
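A minimal base R sketch of this step, assuming the JHU wide layout (one row per location, one column per date named like "1/22/20"): sum the provinces/states of each country with `rowsum()`, then difference the cumulative counts along the dates to get daily new counts. The helper names and the toy table are hypothetical; this is one way to do it, not necessarily the code behind the figures.

```r
# Names of the date columns (those that parse as month/day/two-digit year)
date_cols <- function(wide) {
  names(wide)[!is.na(as.Date(names(wide), format = "%m/%d/%y"))]
}

# Cumulative counts per country: rowsum() adds up the rows of each country
# and returns a matrix with one row per country (sorted by name).
country_cumulative <- function(wide) {
  rowsum(as.matrix(wide[, date_cols(wide), drop = FALSE]), wide[["Country/Region"]])
}

# Daily new count on day t = cumulative(t) - cumulative(t - 1); day 1 minus 0
country_new_counts <- function(wide) {
  cum <- country_cumulative(wide)
  cum - cbind(0, cum[, -ncol(cum), drop = FALSE])
}

# Countries with the largest cumulative count on the most recent date
top_countries <- function(wide, n = 10) {
  cum <- country_cumulative(wide)
  latest <- cum[, ncol(cum)]
  names(latest) <- rownames(cum)
  head(names(sort(latest, decreasing = TRUE)), n)
}

# Hypothetical deaths table in the JHU wide layout, for illustration only
deaths <- data.frame(`Province/State` = c("P1", "P2", ""),
                     `Country/Region` = c("Country A", "Country A", "Country B"),
                     `1/22/20` = c(1, 2, 0), `1/23/20` = c(2, 4, 5),
                     check.names = FALSE, stringsAsFactors = FALSE)
new_deaths <- country_new_counts(deaths)
new_deaths[top_countries(deaths, n = 1), , drop = FALSE]
```

Because the counts are cumulative, any downward revision in the source data shows up as a negative daily value, which is worth checking before plotting.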

2.2 daily reports data

The raw JHU data are in the form of daily reports, with one file per day. Only the more recent files (since March 22) include information for individual U.S. states and counties, as shown in the following figure. Therefore, I turn to the NY Times data for information on individual states and counties.
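A minimal sketch for locating and reading these daily files, assuming each report is named MM-DD-YYYY.csv inside csse_covid_19_data/csse_covid_19_daily_reports and that the newer reports carry a county column called "Admin2" (both are assumptions about the repository layout; check them against your clone). The helper names are hypothetical.

```r
# Sketch, assuming daily report files are named MM-DD-YYYY.csv under
# csse_covid_19_data/csse_covid_19_daily_reports in the local clone.
daily_report_path <- function(date, repo = "../COVID-19") {
  file.path(repo, "csse_covid_19_data", "csse_covid_19_daily_reports",
            paste0(format(as.Date(date), "%m-%d-%Y"), ".csv"))
}

# Assumption: the newer reports include a county column named "Admin2";
# the older ones do not.
has_county_level <- function(report) "Admin2" %in% names(report)

# Read every report that exists between two dates; returns a list named by date.
read_daily_reports <- function(from, to, repo = "../COVID-19") {
  dates <- seq(as.Date(from), as.Date(to), by = "day")
  paths <- daily_report_path(dates, repo)
  keep <- file.exists(paths)
  reports <- lapply(paths[keep], read.csv, check.names = FALSE,
                    stringsAsFactors = FALSE)
  names(reports) <- format(dates[keep])
  reports
}
```

For example, `sapply(read_daily_reports("2020-03-15", "2020-03-25"), has_county_level)` would show on which days county-level rows appear.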

3 NY Times

The NY Times data are saved in two text files: one for state-level information and the other for county-level information.

The current date is

## [1] "2020-06-06"
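A minimal sketch for loading the two files and finding the current date, assuming they are the us-states.csv and us-counties.csv files in the nytimes/covid-19-data GitHub repository (the raw URL below is an assumption; a path to a local copy works just as well). The current date is taken as the latest date present in the data. The helper names are hypothetical.

```r
# Sketch: the raw GitHub URL is an assumption about where the files are served.
nyt_url <- function(level = c("states", "counties")) {
  level <- match.arg(level)
  sprintf("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-%s.csv",
          level)
}

# Pass `path` to read a local copy instead of downloading
read_nyt <- function(level = "states", path = nyt_url(level)) {
  read.csv(path, stringsAsFactors = FALSE)
}

# The "current date" is simply the latest date present in the file
current_date <- function(df) max(as.Date(df$date))
```

Usage: `states <- read_nyt("states")`, `counties <- read_nyt("counties")`, then `current_date(states)`.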

3.1 state level data

First, I check the 30 states with the largest numbers of deaths (the list includes the District of Columbia).

##            date                state fips  cases deaths
## 5273 2020-06-06             New York   36 382102  30123
## 5271 2020-06-06           New Jersey   34 163893  12106
## 5262 2020-06-06        Massachusetts   25 103132   7289
## 5280 2020-06-06         Pennsylvania   42  79507   5986
## 5254 2020-06-06             Illinois   17 127251   5898
## 5263 2020-06-06             Michigan   26  64196   5894
## 5244 2020-06-06           California    6 129147   4626
## 5246 2020-06-06          Connecticut    9  43818   4055
## 5259 2020-06-06            Louisiana   22  42597   2925
## 5261 2020-06-06             Maryland   24  58099   2740
## 5249 2020-06-06              Florida   12  62750   2687
## 5277 2020-06-06                 Ohio   39  38111   2370
## 5255 2020-06-06              Indiana   18  37928   2292
## 5250 2020-06-06              Georgia   13  48943   2147
## 5286 2020-06-06                Texas   48  75077   1840
## 5245 2020-06-06             Colorado    8  27834   1527
## 5290 2020-06-06             Virginia   51  49397   1460
## 5264 2020-06-06            Minnesota   27  27512   1181
## 5291 2020-06-06           Washington   53  24486   1163
## 5242 2020-06-06              Arizona    4  25517   1046
## 5274 2020-06-06       North Carolina   37  34809   1020
## 5266 2020-06-06             Missouri   29  14659    823
## 5265 2020-06-06          Mississippi   28  17034    811
## 5282 2020-06-06         Rhode Island   44  15441    772
## 5240 2020-06-06              Alabama    1  20043    689
## 5293 2020-06-06            Wisconsin   55  20701    646
## 5256 2020-06-06                 Iowa   19  21527    602
## 5283 2020-06-06       South Carolina   45  13916    545
## 5248 2020-06-06 District of Columbia   11   9269    483
## 5258 2020-06-06             Kentucky   21  11359    480
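A ranking like the one above can be sketched in base R by keeping only the most recent date and ordering by cumulative deaths. The function name and the toy rows are hypothetical; since the county file also has `date` and `deaths` columns, the same function should work there with `n = 50`.

```r
# Keep the most recent date, then sort by cumulative deaths (largest first).
top_by_deaths <- function(df, n = 30) {
  d <- as.Date(df$date)
  latest <- df[d == max(d), ]
  head(latest[order(latest$deaths, decreasing = TRUE), ], n)
}

# Hypothetical rows in the NY Times state layout, for illustration only
states <- data.frame(date   = c("2020-06-05", "2020-06-06", "2020-06-06", "2020-06-06"),
                     state  = c("State A", "State A", "State B", "State C"),
                     cases  = c(10, 12, 30, 5),
                     deaths = c(1, 2, 4, 3),
                     stringsAsFactors = FALSE)
top_by_deaths(states, n = 2)
```

Subsetting keeps the original row names, which is why the printed table above shows row indices such as 5273 rather than 1 to 30.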

For these 20 states, I check the number of new cases and the number of new deaths. Part of the purpose is to identify whether these patterns share any similarities. For example, can the pattern seen in Italy be used to predict what happens in an individual state, and what are the similarities and differences across states?
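Since the NY Times files report cumulative counts, the daily numbers come from differencing within each state. A minimal base R sketch using `ave()`; the helper name and the toy rows are hypothetical.

```r
# Convert cumulative counts to daily new counts within each state.
add_daily_new <- function(df) {
  df <- df[order(df$state, as.Date(df$date)), ]
  daily <- function(x) diff(c(0, x))  # the first reported day counts as new
  df$new_cases  <- ave(df$cases,  df$state, FUN = daily)
  df$new_deaths <- ave(df$deaths, df$state, FUN = daily)
  df
}

# Hypothetical rows (deliberately out of date order), for illustration only
states <- data.frame(date   = c("2020-06-06", "2020-06-05", "2020-06-06"),
                     state  = c("State A", "State A", "State B"),
                     cases  = c(12, 10, 30),
                     deaths = c(2, 1, 4),
                     stringsAsFactors = FALSE)
add_daily_new(states)
```

Sorting by state and date first matters: `ave()` applies the function to each group in row order, so unsorted input would give meaningless differences.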

Next, I check the relationship between the cumulative numbers of cases and deaths for these 10 states, starting on March
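A sketch of this comparison on the NY Times state data: keep the selected states from a start date onward and draw cumulative deaths against cumulative cases on log-log axes, one line per state. The start date is a parameter and the value used below is only a placeholder; the helper names and the toy rows are hypothetical.

```r
# Keep rows for the chosen states from a start date on, dropping zero counts so
# that both axes can be shown on a log scale.
cases_vs_deaths <- function(df, states, start) {
  d <- as.Date(df$date)
  keep <- df$state %in% states & d >= as.Date(start) & df$cases > 0 & df$deaths > 0
  out <- df[keep, c("date", "state", "cases", "deaths")]
  out[order(out$state, as.Date(out$date)), ]
}

# One line per state: cumulative deaths against cumulative cases, log-log axes
plot_cases_vs_deaths <- function(sub) {
  st <- unique(sub$state)
  plot(sub$cases, sub$deaths, log = "xy", type = "n",
       xlab = "cumulative cases", ylab = "cumulative deaths")
  for (i in seq_along(st)) {
    s <- sub[sub$state == st[i], ]
    lines(s$cases, s$deaths, col = i)
  }
  legend("topleft", legend = st, col = seq_along(st), lty = 1, cex = 0.7)
}

# Hypothetical rows and a placeholder start date, for illustration only
toy <- data.frame(date   = c("2020-02-28", "2020-03-02", "2020-03-03", "2020-03-03"),
                  state  = c("State A", "State A", "State A", "State B"),
                  cases  = c(5, 20, 40, 10),
                  deaths = c(0, 1, 2, 1),
                  stringsAsFactors = FALSE)
sub <- cases_vs_deaths(toy, c("State A", "State B"), start = "2020-03-01")
# plot_cases_vs_deaths(sub)  # draws the figure on the current graphics device
```

On log-log axes, states with a similar ratio of deaths to cases appear as parallel lines, which makes differences between states easier to see than on a linear scale.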

3.2 county level data

First, I check the 50 counties with the largest numbers of deaths.

##              date               county                state  fips  cases deaths
## 211368 2020-06-06        New York City             New York    NA 211274  21294
## 210192 2020-06-06                 Cook             Illinois 17031  81924   3913
## 211367 2020-06-06               Nassau             New York 36059  40853   2635
## 210878 2020-06-06                Wayne             Michigan 26163  21163   2627
## 209796 2020-06-06          Los Angeles           California  6037  62338   2620
## 211387 2020-06-06              Suffolk             New York 36103  40278   1970
## 210792 2020-06-06            Middlesex        Massachusetts 25017  22686   1701
## 211293 2020-06-06                Essex           New Jersey 34013  18066   1701
## 211288 2020-06-06               Bergen           New Jersey 34003  18492   1612
## 211395 2020-06-06          Westchester             New York 36119  33923   1523
## 211791 2020-06-06         Philadelphia         Pennsylvania 42101  23529   1414
## 209895 2020-06-06            Fairfield          Connecticut  9001  16020   1309
## 209896 2020-06-06             Hartford          Connecticut  9003  10747   1279
## 211295 2020-06-06               Hudson           New Jersey 34017  18548   1210
## 211306 2020-06-06                Union           New Jersey 34039  16116   1095
## 210859 2020-06-06              Oakland             Michigan 26125  10980   1055
## 211298 2020-06-06            Middlesex           New Jersey 34023  16203   1032
## 209899 2020-06-06            New Haven          Connecticut  9009  11817   1007
## 210788 2020-06-06                Essex        Massachusetts 25009  15170    998
## 211302 2020-06-06              Passaic           New Jersey 34031  16436    969
## 210796 2020-06-06              Suffolk        Massachusetts 25025  18955    923
## 210846 2020-06-06               Macomb             Michigan 26099   6940    870
## 210794 2020-06-06              Norfolk        Massachusetts 25021   8689    859
## 210798 2020-06-06            Worcester        Massachusetts 25027  11696    820
## 211301 2020-06-06                Ocean           New Jersey 34029   8979    767
## 209951 2020-06-06           Miami-Dade              Florida 12086  19298    765
## 211786 2020-06-06           Montgomery         Pennsylvania 42091   7542    724
## 210905 2020-06-06             Hennepin            Minnesota 27053   9255    667
## 210326 2020-06-06               Marion              Indiana 18097  10390    663
## 210774 2020-06-06           Montgomery             Maryland 24031  12662    652
## 211763 2020-06-06             Delaware         Pennsylvania 42045   6661    651
## 211299 2020-06-06             Monmouth           New Jersey 34025   8454    636
## 211300 2020-06-06               Morris           New Jersey 34027   6584    626
## 210790 2020-06-06              Hampden        Massachusetts 25013   6337    618
## 210775 2020-06-06      Prince George's             Maryland 24033  16838    595
## 210795 2020-06-06             Plymouth        Massachusetts 25023   8347    588
## 212436 2020-06-06                 King           Washington 53033   8419    578
## 211353 2020-06-06                 Erie             New York 36029   6429    547
## 211749 2020-06-06                Bucks         Pennsylvania 42017   5243    529
## 211812 2020-06-06           Providence         Rhode Island 44007  11052    518
## 210713 2020-06-06              Orleans            Louisiana 22071   7222    512
## 211297 2020-06-06               Mercer           New Jersey 34021   7148    500
## 209695 2020-06-06             Maricopa              Arizona  4013  12761    489
## 209908 2020-06-06 District of Columbia District of Columbia 11001   9269    483
## 210786 2020-06-06              Bristol        Massachusetts 25005   7635    467
## 211379 2020-06-06             Rockland             New York 36087  13315    465
## 211142 2020-06-06            St. Louis             Missouri 29189   5029    460
## 210703 2020-06-06            Jefferson            Louisiana 22051   7831    458
## 211304 2020-06-06             Somerset           New Jersey 34035   4664    425
## 212325 2020-06-06              Fairfax             Virginia 51059  12056    413

For these 50 counties, I check the number of new cases and the number of new deaths.
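As for the states, the daily numbers come from differencing the cumulative counts, here within each county. The grouping key needs care: county names repeat across states (Middlesex, Essex, Suffolk and Montgomery each appear twice in the table above), and New York City has an NA fips, so a minimal sketch keys each county by its name and state. The helper name and the toy rows are hypothetical.

```r
# Key counties by "county, state" because county names are not unique across
# states and fips is NA for New York City in the table above.
county_new_counts <- function(df) {
  df$key <- paste(df$county, df$state, sep = ", ")
  df <- df[order(df$key, as.Date(df$date)), ]
  daily <- function(x) diff(c(0, x))  # cumulative -> new; first day counts as new
  df$new_cases  <- ave(df$cases,  df$key, FUN = daily)
  df$new_deaths <- ave(df$deaths, df$key, FUN = daily)
  df
}

# Hypothetical rows: the same county name in two different states
counties <- data.frame(date   = c("2020-06-05", "2020-06-06", "2020-06-06"),
                       county = c("County X", "County X", "County X"),
                       state  = c("State A", "State A", "State B"),
                       cases  = c(5, 8, 3),
                       deaths = c(1, 2, 0),
                       stringsAsFactors = FALSE)
county_new_counts(counties)
```

Grouping by county name alone would merge the two "County X" series and produce wrong differences.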

4 COVID Tracking

The positive rate of testing can indicate how widely COVID-19 has spread. However, these data are noisy, because negative test results are often not reported and tests are almost certainly given to a non-representative sample of the population. The COVID Tracking Project provides a grade for each state: "If you are calculating positive rates, it should only be with states that have an A grade. And be careful going back in time because almost all the states have changed their level of reporting at different times." (https://covidtracking.com/about-tracker/). The data are available for both counties and states; here I only look at state-level data.
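A minimal sketch of the grade filter, assuming the state-level records have been loaded into a data frame with a sortable `date` column (e.g. YYYYMMDD), a two-letter `state` column and a grade column. The column name `dataQualityGrade` is an assumption, so pass the actual name if it differs. The function keeps states whose most recent record has a grade beginning with "A"; the helper name and the toy rows are hypothetical.

```r
# Assumed columns: date (sortable, e.g. YYYYMMDD), state (two-letter code) and a
# grade column whose name defaults to the hypothetical "dataQualityGrade".
grade_a_states <- function(df, grade_col = "dataQualityGrade") {
  latest <- df[df$date == max(df$date), ]
  g <- as.character(latest[[grade_col]])
  # Count any grade starting with "A" (e.g. "A" or "A+", if present) as an A grade
  sort(unique(latest$state[!is.na(g) & startsWith(g, "A")]))
}

# Hypothetical rows with made-up state codes, for illustration only
toy <- data.frame(date  = c(20200605, 20200606, 20200606, 20200606),
                  state = c("AA", "AA", "BB", "CC"),
                  dataQualityGrade = c("B", "A", "B", "A+"),
                  stringsAsFactors = FALSE)
grade_a_states(toy)
```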

Since the daily positive rate can fluctuate a lot, here I only illustrate the cumulative positive rate over time for four states with grade A data. Of course, since this is an R Markdown file, you can modify the source code to check other states.
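A minimal sketch of the cumulative positive rate, assuming `positive` and `negative` columns holding cumulative counts of positive and negative results up to each date, and defining the rate as positive / (positive + negative). Both the column names and this definition are assumptions; other denominators (e.g. total test results) are possible. The helper name and the toy rows are hypothetical.

```r
# Assumed definition: cumulative positive rate = positive / (positive + negative),
# where both columns are cumulative counts up to that date.
cumulative_positive_rate <- function(df, states) {
  keep <- df$state %in% states & !is.na(df$positive) & !is.na(df$negative) &
    (df$positive + df$negative) > 0
  out <- df[keep, c("date", "state", "positive", "negative")]
  out$pos_rate <- out$positive / (out$positive + out$negative)
  out[order(out$state, out$date), ]
}

# Hypothetical rows with made-up state codes, for illustration only
toy <- data.frame(date     = c(20200601, 20200602, 20200601),
                  state    = c("AA", "AA", "BB"),
                  positive = c(10, 30, NA),
                  negative = c(90, 170, 50),
                  stringsAsFactors = FALSE)
cumulative_positive_rate(toy, c("AA", "BB"))
```

In the toy rows, the cumulative rate moves from 0.10 to 0.15, while the rate among that day's new results alone would be (30 - 10) / (200 - 100) = 0.20. This illustrates why the daily figure swings more than the cumulative one.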

5 Session information

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] httr_1.4.1    ggpubr_0.2.5  magrittr_1.5  ggplot2_3.3.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3       pillar_1.4.3     compiler_3.6.2   tools_3.6.2     
##  [5] digest_0.6.23    lattice_0.20-38  nlme_3.1-144     evaluate_0.14   
##  [9] lifecycle_0.2.0  tibble_3.0.1     gtable_0.3.0     mgcv_1.8-31     
## [13] pkgconfig_2.0.3  rlang_0.4.6      Matrix_1.2-18    yaml_2.2.1      
## [17] xfun_0.12        gridExtra_2.3    withr_2.1.2      stringr_1.4.0   
## [21] dplyr_0.8.4      knitr_1.28       vctrs_0.3.0      cowplot_1.0.0   
## [25] grid_3.6.2       tidyselect_1.0.0 glue_1.3.1       R6_2.4.1        
## [29] rmarkdown_2.1    purrr_0.3.3      farver_2.0.3     splines_3.6.2   
## [33] scales_1.1.0     ellipsis_0.3.0   htmltools_0.4.0  assertthat_0.2.1
## [37] colorspace_1.4-1 ggsignif_0.6.0   labeling_0.3     stringi_1.4.5   
## [41] munsell_0.5.0    crayon_1.3.4